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ABSTRACT 



Sunspot numbers exhibit large short-timescale (daily-monthly) variation in 
addition to longer timescale variation due to solar cycles. A formal statistical 
framework is presented for estimating and forecasting randomness in sunspot 
numbers on top of deterministic (including chaotic) models for solar cycles. The 
Fokker-Planck approach formulated assumes a specified long-term or secular 
variation in sunspot number over an underlying solar cycle via a driver function. 
The model then describes the observed randomness in sunspot number on top 
of this driver function. We consider a simple harmonic choice for the driver 
function, but the approach is general and can easily be extended to include 
other drivers which account for underlying physical processes and/or empirical 
features of the sunspot numbers. The framework is consistent during both solar 
maximum and minimum, and requires no parameter restrictions to ensure non- 
negative sunspot numbers. Model parameters are estimated using statistically 
optimal techniques. The model agrees both qualitatively and quantitatively with 
monthly sunspot data even with the simplistic representation of the periodic solar 
cycle. This framework should be particularly useful for solar cycle forecasters and 
is complementary to existing modeling techniques. An analytic approximation 
for the Fokker-Planck equation is presented, which is analogous to the Euler 
approximation, which which allows for efficient maximum likelihood estimation 
of large data sets and/or when using difficult to evaluate driver functions. 

Subject headings: (Sun:) sunspots — methods: statistical 
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Introduction 



Sunspots are regions on the solar surface where strong magnetic fields pierce the 
photosphere. Sunspots form when magnetic fiux tubes rise out of the solar interior and 
cross the photosphere. The magnetic field around these fiux tubes is sufficiently strong to 
disrupt the usual process of convective heat transport to the surface, so sunspots appear as 
small dark spots on the photosphere. Areas around sunspots where surface magnetic fields 
are particularly strong are known as active regio ns and host explosive magnetic ev ents 



including solar fiares and coronal mass ejections flTandberg-Hanssen fc Emslie 



19881) 



The daily sunspot number is described using the Wolf number 



s = k{10g + N) 



where q is the number o: 



' sunspot groups and is the number of individual spots 
19771 ). The sunspot number is a weighted average of individual 
spots and groups, and the correction factor k depends on a number of factors, including 



(IBruzack &: Durrant 



the 



ocation of the observatory, instrument parameters, and counting method f lPetrovay 
2010l ). The data used in this paper is a weighted average of measurements from a network 
of observatories (the 'International sunspot number'), produced by the Solar Infiuences 
Data Analysis Center (SIDC), Royal Observatory of Belgiunu. The sunspot number is a 
constructed measure of solar activity and not a physical quantity, and is represented by 
a non-negative number with an important lower limit of zero corresponding to no visible 
active regions. 



The mean sunspot number varies with a semi-regular 11-year cycle ([Parker 



19551), but 



there are large daily, weekly, and yearly fiuctuations on top of this regularity, as well as 



^Sunspot data is provided by the US National Geophysical Data Center (NGDC) at 
ht tp : / / www. ngdc . noaa . gov /stp/space weather . html 
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large variations in the maximum sunspot number in a cycle. The magnetic fields responsible 
for the forma tion of active regions are generated by a dynamo process in the solar interior 



flTobias 



20021 ) ■ In the solar convection zone, the fiow of plasma and magnetic fields is 



turbulent due to the high value of the hydrodynamic Reynolds number ( lOssendrijver 



20031). 



As a result it is difficult to accuratel y model the stren gth, location, and timing of magnetic 



fields appearing at the solar surface (IChoudhuri 



20081 ). A large body of literature suggests 



that the under 



( iLetellier et al 



2006 



solar cycles driven by the 



Hanslmeier fc Brajsa 



dynamo also involve chaotic dynamics 



2010 



Aguirre et al. 



20081). 



Compre hensiv e reviews of tec h nique s for modelling/ 



presented by 



Kanel (120071 ) . IPesnelll (120081 ) . and 



Petrovay 



breca sting solar cycles have been 



(120101 ). Existing methods generally 



use averaged/smoothed sunspot data, which means they desc ribe the under 



ying solar 



20101 ). This is an 



cycle and not short-term fiuctuations in the sunspot number (iPetrovayl 
important difference. Substantial day-to-day fluctuations occur due to both the rapid 
appearance/formation of large active regions and fast development of magnetic structures 
within active regions. These drive extreme space weather events which affect the Earth 



( ICommittee On The Societal fc Economic Impacts Of Severe Space Weather Events! 120081 ). 



On average, the sunspot number will jump by more than 50 in a single day more than 20 
times per solar cycle. The largest single-day jump for the interval 1850-2010 was As = 112, 
which occurred in April 1947. The characteristics of long-term variations in solar cycles 
have been extensively studied, but the distribution of these large short-term fluctuations 
has not. In this paper we introduce a formal statistical framework for modelin g day-to-day 



fluctuations in sunspot number. Our approach is similar to a recent paper by 



Allen fc Hufl 



(120101 ) in that it treats the sunspot numbers as a diffusion process, but it has a number 
of speciflc advantages over this earlier method. First, our model provides a framework 
for combining deterministic (including chaotic) models for secular variation in solar cycles 
with statistical analysis of the sunspot number time series. Hence the framework could in 
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principle incorporate chaotic-oscillator type models to account for pseudo-periodic solar 
cycles underneath short-timescale stochastic fluctuations in sunspot number. Second, the 
model allows model parameters to be estimated from the data using maximum likelihood 
(ML), w hich provides optimal estimates in the sense of efficiency and consistency in large 



samples (IDacunha-Castelle fc Florens-Zmirou 



19861 ) ■ Third, the formulation enforces the 



non-negativity of sunspot number, so that the model is valid during both solar maximum 



and minimum. This avoids ad hoc treatment o 
number, a problem with the 



Allen fc Hufa mm method. 



the boundary condition at zero sunspot 



The layout of this paper is as follows. In section [2] we derive a Fokker-Planck equation 
for the sunspot number distribution, and illustrate the general properties of the equation 
using a toy model with a simple harmonic choice for the periodic driver function. Section 
3.1 summarizes the details of maximum likelihood estimation of diffusion processes. In 
section [3^2] we apply the ML technique to monthly sunspot data for the interval 1975 to 
2009, again using a simple harmonic choice of driver function. The results agree both 
qualitatively and quantitatively with the empirical sunspot distribution. Section H] presents 
an analytic approximation of the Fokker-Planck equation (|8]), which allows for efficient 
maximum likelihood estimation of large data sets and/or when using driver functions which 
are difficult to evaluate. 



2. A Fokker-Planck equation for the sunspot number 

In this section we introduce a continuous-time stochastic model for sunspot number 
using a Fokker-Planck equation. Sunspots form and disappear on the visible solar surface 
continuously, so it is intuitive to represent the sunspot number at time t with a continuous 
variable s{t). Due to complicated physical processes associated with sunspot formation and 
evolution, the sunspot number is uncertain and s{t) is stochastic. As such, we are interested 
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in the time evolution of the probabihty distribution function (pdf ) of the sunspot number 
given an initial sunspot number s{to) = Sq at time to- We denote this conditional pdf 



f{s,t) = f{s,t\so), 



(2) 



where f{s,t)ds is the probabihty that s{t) lies in the range (s, s + ds) at time t given that 
it was initially at sq. 

Long term or secular variation in the sunspot numbers due to the solar cycle is 
represented by a driver function 6(t). This driver function is chosen to reflect underlying 



physical processes and/or em pirical features o: 



dynamo, the Gnevyshev Gap (iGnevyshev 



the solar cycles (e.g. a semi-periodic 



19671 ). the Waldmeier effect (IWaldmeier 



19351) 



asymmetric/chaotic cycles etc). For example, the function 6{t) might be the solution to a 
system of nonlinear differential equations, in which case the model could describe chaotic 
solar cycles. The model presented here does not attempt to account for the solar cycle, 
which must be contained in the choice of a periodic function for 6{t). The model describes 
the randomness on top of this underlying secular variation. 



2.1. Statistics of sunspot data and motivation 

To some insight into the short-term fluctuations in sunspot number on top of the 
solar cycle variation, we consider the empirical distribution of daily sunspot numbers. 
The size of deviations between consecutive observations of the sunspot number data 
\r{t)\ = \s{t) — s(t — At) I, where At is the daily time step in the observations, is a proxy 
for the standard deviation (so that r(t)^ corresponds to the variance) in sunspot number 
at different times during the solar cycle. Figure [1] shows that this quantity increases with 
the solar cycle. The upper panel plots \r(t)\ over the last sixty years. The lower panel is 
a smoothed daily sunspot number time series showing the underlying solar cycle over the 
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same period. A minimal description of this data requires an account of the underlying solar 
cycle (here provided by the driver function 0{t)), as well as a statistical model accounting 
for the observed non-zero variance at zero sunspot number, and the observed increase in 
variance with the amplitude of the underlying solar cycle. 

The model should account for the observed statistical variation in sunspot numbers 
over a cycle. The sunspot number distribution f{s,t) changes significantly during a 
cycle. Figure [2] shows the distribution f{s) of daily sunspot numbers averaged in time 
over sunspot minimums (green), sunspot maximums (red), and the total time-averaged 
sunspot distribution using daily data for the last three complete solar cycles (1975-2006). 
This figure shows that the character of day-to-day fluctuations of the sunspot number is 
starkly different during different phases of the solar cycle. The sunspot number distribution 
during solar minimum (shown in green) is concentrated at zero, and the tail exhibits 
approximate exponential decay. The distribution during solar maximum (shown in red) 
is approximately a positively-skewed Gaussian. The overall time-averaged distribution 
during this period (shown in blue) is dominated by the large number of zero sunspot 
numbers. The time-averaged distribution is affected by the sampling frequency and number 
of cycles included. Section 13.21 shows that the distribution of monthly sunspot numbers 
during 1975-2006 is more strongly bi-modal than the daily distribution shown in Figure O 
The time-averaged distribution of daily sunspot numbers for 1850-2010 is approximately 
exponential. This is due to the large number of zero sunspot numbers (more than 14% of 
days have a zero sunspot number), the large variations in cycle amplitude, and the large 
fluctuations in maximum sunspot number. It is this important daily stochastic variation 
that we are attempting to model. 
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2.2. Derivation of a Fokker-Planck equation 

In this section we derive a particular Fokker-Planck equation appropriate to model the 
sunspot number distribution f{s,t). The total probability 



oo 



f{s,t)ds (3) 

must always be unity, since probability is a conserved quantity. Local conservation of 
probability implies the conservation equation 

df{s,t) _ dF{s,t) 
dt ds 

where F{s,t) is the probability flux. A general form for F{s,t) is 



(4) 



F{s, t) = t)f{s, t) - [a'{s, t)f{s, t)] (5) 



19891 ) 



where fi{s,t) and a'^{s,t) are advection and diffusion terms respectively (iRiskenI 
The advection coefficient represents the deterministic behaviour of the sunspot number 
evolution (i.e. the effect of the underlying solar cycle on sunspot number), and the diffusion 
coefficient represents the short-term stochastic behaviour. 

We assume that there is a delayed response to the driver function 6{t), given by an 
advection term in equation of the form 

fxis,t) = K[9it)-s], (6) 

where 1/k is a lag time between the process of driving and the formation of sunspots. 
When s < 6(t) the advection term is positive and we expect an increase in the sunspot 
number, and vice versa. This choice ensures that the sunspot number remains close to a 
level determined by 0{t). When the lag time is small the sunspot number reacts quickly 
to changes in the driver. The driver 6{t) may be interpreted as a typical sunspot number 
determined by an underlying model for the solar cycle. 
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As discussed in section I2.H a minimal model of sunspot number variance requires 
parameters to describe variance at zero sunspot number, and the increase in variance with 
the increase in sunspot number. Hence we assume that the diffusion depends quadratically 
on the sunspot number s{t): 

a^{s,t) = Po + l3is + /32s\ (7) 

where /3q, /3i and /32 are positive constants. 

With the choices of equations and ([7]) the Fokker-Planck equation for the sunspot 
number distribution is 

= 2^ { [/5o + + P2s'] f{s, t)}-§-J^ m) - s] f{s, t)} (8) 

where 6{t) is a prescribed driver function. The initial condition for the PDE (|8]) is the delta 
function 

f{s,to)=6{s-So), (9) 

which ensures that total probability is conserved at tg- As s — )■ oo we have the 'far field' 
condition 

f{s,t)^0 (10) 

which ensures that very large sunspot numbers are unlikely. The model has only four 
parameters: the mean reversion k; and the three variance terms f3o, Pi and /32- Parameters 
in the driver function 6{t) are external to the model. As discussed in section 12. 1^ we 
consider this to be the minimum number of parameters required for an accurate description 
of sunspot data. The mean reversion represents a time lag in the rise and fall of sunspot 
numbers associated with changes in the underlying solar cycle. The three variance 
parameters represent variance when the sunspot number is zero (one parameter), and the 
increase of the variance with sunspot number (two parameters). 



- 10 - 



To determine the behaviour of f{s,t) at s = we note that the diffusion process which 
underhes the Fo kker-Planck equation (IHl) may exhibit comphcated behaviour near the 



s = boundary (IKarlin Sz Taylor 



19811 ). To describe the sunspot numbers the underlying 
Brownian motion must remain non-negative, but there is a significant probability of 
observing a zero sunspot number. For this reason it is difficult to extend the stochastic 



differential equation formulation of 



Allen fc Hufj (120101 ) to account for both solar maximum 



and minimum without using an ad-hoc treatment of the stochastic process at zero. In the 
Fokker-Planck approach the non-negativity constraint on s{t) means that probability in 
s > cannot move into the region s < and the appropriate boundary condition at s = 
is the zero probability flux condition 



/i(s,t)/(s,t)-^^[o-2(s,t)/(s,t 



0. 



s=0 



Although the choice of equation (ITTl) may appear obvious in the context of the Fokker- 
Planck equation, the formal treatment of the s = boundary presents a problem in the 
stochastic differential equation approach. In the Fokker-Planck equation formulation 
however, this physical constraint is a natural component of the model. There are no 
restrictions on the choice of the driver 6(t), and during estimation there are no restrictions 
on the choice of parameters in fi{s,t). 

The time evolution of the sunspot number distribution is defined by the model given 
by equation ([H]), the initial condition ([H]), the boundary conditions (ITUl) and (ITT]) , and a 
choice for the driver 6{t). For large s the distribution f{s,t) resembles a positively skewed 
Gaussian distribution. Near zero sunspot number the zero-fiux boundary condition causes 
probability to accumulate around s = 0, and /(s,t) often resembles an exponential. The 
response of the sunspot nurnber d istribution to the driver function is determined by the 



characteristics (ILindenbaum 



19961 ) of the Fokker-Planck equation ([8]), which are given by 



-li- 



the ODE 

ds{t) 



K6{t) - /3i - (2/32 + s with s{to) = Sq. (12) 



dt 

The solution to the characteristic ODE fll2p for the initial condition sq is 

s{t) = e-(2/32+-){t-to) / + [^^(^/) _ e^^/'^+'^^^'dt' j . (13) 



to 



2.3. A toy model for a starspot cycle 

We briefly investigate a toy model involving a simple choice for 6(t) to illustrate the 
features of the model. 

Many stars exhibit stellar cycles, so a simple model for the driver function for a stellar 
cycle is the harmonic choice 

9{t) = ao + aism{2'n-t/a2 + a^), (14) 

where a2 and determine the period and phase of the cycles, and ao and ai determine the 
maximum and minimum amplitudes of the driving. We assume that our toy stellar cycle 
involves stochastic formation and decay of starspots on the stellar surface, analogous to the 
Sun. With the choice of equation f ll4p for a driver the solution to the characteristic ODE 
f fT3|) has the form 

sit) = Stran(t) + Spcr(^) (15) 

where 

s,Ut) = {so + ^ pvrcosas - (2/3^ + «) sin as] + ^^^j e'^^^^^^)* (16) 

and 

Sper(t) = Ao + Ai sin (27rt/a2 + A3) , (17) 



-VI- 



with 



D =al (2/32 + t^f + 47r2 



(18) 




UqK — Pi 



(19) 



A 



(20) 



1 



A^ =tan 



-1 



a2 (2/^2 + k) sin — 2tt cos 0:3 



(21) 



«2 (2/32 + ^) COS 03 + 27r sin 03 



The term Stran(^) describes the transient response of the system to the initial condition 
So, and Sper{t) describes the long-term response to the underlying stellar cycle, which is 
represented by the sinusoidal driver function. Specifically, s(t) — )■ Sper(t) as t — 00. 

In equations (fT5 !) -( fT8l) . if k > and /32 > the amplitude of the response is less than 
the amplitude of the driver, with equality achieved in the limiting case where the response 
time 1/k approaches zero. These requirements ensure that the distribution of the sunspot 
number returns to a long-term periodic response to the driver 6{t) regardless of the initial 
condition sq- In general there is a lag between the driver 6{t) and the response of the 
sunspot number, so that the driver and the response are out of phase by 



When 1/k — 7- the response to the driver is instantaneous and the phase of the driver 
and reaction coincide, in which case s(t) = 9{t). To investigate a specific toy model 
numerically we re-scale time by the period of the star's dynamo, and assume our equations 
are non-dimensional. The driver function representing the periodic variation in the stellar 
cycles is 



and the non-dimensional diffusion term representing the stochastic emergence and formation 
of starspots is assumed to be 



A = as - A3. 



(22) 



e{t) = 1 + sin (27rt) , 



(23) 




s,t) = 2 + 0.5s + 0.2sl 



(24) 
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The non-dimensional response time of starspots to the driver is assumed to be 

l//€ = 0.25. (25) 

The Fokker-Planck equation governing the time evolution of the non-dimensional starspot 
number on this star is 

where 9{t), a^{s,t), and n are given by equations (!23|) . (^^, and (p5l) respectively. The 
initial condition is the delta function 

f{s,to) = S{s-so), (27) 

where for simplicity we take to = 0. Equation describes a dynamic system in which 
the starspot number responds to a simple sinusoidal stellar cycle. The resulting emergence 
and formation of starspots is stochastic, and the uncertainty increases with the starspot 
number. The response of the sunspot number distribution to the driver 6{t) is determined 
by the characteristic curves, which are given by equation ([T^ . 

Figure [3] shows the numerical solution of the Fokker-Planck equation (!26|) for the 
toy model with Sq = 2. The driver function (dashed curve) and the characteristic (solid 
curve) are superposed on the contours of the distribution in the s-t plane. The distribution 
exhibits a lag with respect to the driver (given by the angle A = —0.96) and follows the 
characteristic curve from the initial condition sq = 2 to the long-term response described 
by Sper(t) with Aq = 0.80 and amplitude Ai = 0.52. The figure illustrates the 'accumulation 
of probability' about zero starspot number around times of minimum, due to the zero 
probability flux boundary condition at s = 0. The variance in the starspot number increases 
with s{t), and the figure shows that the response of the starspot numbers to the driving is 
more varied around starspot maximum. 
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Parameter estimation 



3.1. Parameter estimation for diffusion processes 



An important advantage of the Fokker-Planck equation formulation presented here is 
that it allows statistically rigorous estimation of model parameters from data. The time 
series of sunspot numbers s = {s (to) , (ti) , . . . , s (t^)} is considered to be a discretely 
observed realisation of the underlying continuous diffusion process. The distribution of s(t) 
at time tt+i is dependent only on the previous observation Si = s{ti) [the Markov property 



( iKaratzas fc Shreve 



19911 )]. The observations are assumed to be generated according to the 
conditional pdf / {s,t\si] ft), which depends on a set of parameters ft we want to estimate 
from the observed sunspot number time series s. The conditional pdf / {s,t\si] O) satisfies 



the Fokker-Planck equation 
a/(s,t|s,;0) 1 



d 



dt 



with initial condition 



and zero flux condition 



2 [^'(^' n)f {s, t\sf, n)] - ^ [/i(s, t; n)f {s, t\sf, n)] (28) 



/ {s,t\si; Q) = 6 {s - Si 



1 d 



(29) 



/i(s, t- n)f{s, t\sf, - 2 ^ [^'(^' ^)/(^' ^1^^; ^ 



The values for the estimated parameters are denoted ft. 



(30) 



s=0 



Maximum likelihood estimates are consi dered optimal in the sense that they are 



19861 ). 



both efficient and consistent in large samples (iDacunha-Castelle fc Florens-Zmirou 
Qualitatively, this means that as the sample size grows, the probability of a maximum 
likelihood estimator being different to the true parameters converges to zero. Also, as the 



sample size grows the variance of the estimator converges to a theoretical minimum value. 
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The likelihood function L for a realisation s is defined as 

i=T 

£(f^|s) := J]/(s,|s,_i;0) 



(31) 



where st is the final observation in the time series s, and the maximum likelihood estimator 



O is the particular which maximises the log-likelihood 

i=T 

log£(0|s) = ^log/(si|s,_i;0). 



(32) 



j=i 



For arbitrary advection and diffusion terms in equation fl28|) and/or difficult boundary 
conditions, general solutions for /(s,t|sj) are unavailable and approximation techniques 
are required. Jensen and Poulsen (2002) found that the most accurate technique for 



to f 



g, t|gn) using Hermite polynomial expansions about a normal distribution flAi't-Sahalia 



1999 



19881). 



20021). foUoAved by direct numerical solution of the Fokker-Planck equation (ILo 



Hurn et al 



(Ait- 


Sa^ 


lalia 


ion ( 


Lo 





( 120071 ) also found that the two most accurate techniques for parameter 
estimation of diffusion processes involved maximum likelihood procedures using these 
approximations for the unknown pdf /(s,t|si). 

In our model, probability accumulates around zero sunspot number at times of 
minimum due to the zero probability flux boundary condition. As a result an approximation 
of /(s,t|so) by an expansion about a no r mal distribu tion is not always accurate. Hence 



we do no use the method of 



Ait-Sahahal ( 11999 



2OO2I ). but instead apply direct numerical 



solution of the Fokker-Planck equation (I2S]) to approximate the unknown pdf /(s,t|sj). 
he numerical solutions are obtained us ing an exponentially-fitted finite difference schem e 



( de Allen fc Southwell 



1955 



Duffv 



20061 ) with Rannacher time stepping (jRannacherlll984l ). 
These numerical solutions of equation (1281) are then used to find the maximum likelihood 
estimates f2 of the parameters f2 of the sunspot number pdf /(s,t|sj). The optimization of 



th e log-hkehhood (|32|) is 



by 



Haupt fc Haupt 



mm . 



performed using a genetic algorithm based on a routine described 
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3.2. Maximum likelihood estimation of the monthly sunspot number 

In this section we apply the model discussed in section [2] to the monthly sunspot 
number time series. To introduce the methodology we use the analysis of the toy model 
in section 12.31 with the sinusoidal driver function f[T^ and apply it to the last three cycles 
of the monthly sunspot numbers (1975-2006). Time is measured in months, and we set 
to = to be January 1975. Despite the simple (harmonic) representation of the periodic 
solar cycle, we achieve both qualitative and quantitative agreement between the model 
distributions and the sunspot data. 

Table 1: Maximum likelihood estimates of the model parameters O using monthly sunspot 
data 1975-2006. 



So 


Si 


S2 


S3 


^0 


A 










month 


rad 


month 


month" ^ 


month ~^ 


month" 


71.04 


-69.38 


123.4 


1.222 


2547 


12.65 


0.53 


6.51 



Table [T] displays the maximum likelihood estimates of the sunspot number parameter 
set ri using the monthly data. Figure H] plots the numerical solution of the Fokker-Planck 
equation ([8]) with the maximum likelihood parameter set from Table [H The initial 
observation sq = 18.9 is for January 1975 and the initial condition is the delta function 
/(s,0|so) = S{s — 18.9). The final observation sitr) = 13.6 is for December 2006. The 
So = 18.9 characteristic curve is shown by a solid line, and the monthly data for 1975 to 
2006 are superposed on the contours of the model sunspot number pdf. The dashed line is 
the expected sunspot number (s(t)), which is defined by 

POO 

{sit))= / s'fis',t)ds'. (33) 
Jo 

The analysis of section 12.31 is appropriate since we are using a harmonic driver. The 
transient term Stran(^) in equation f|T5l) vanishes quickly, and the long-term response of the 
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sunspot number pdf is determined by the periodic term Spcrit). The sunspot number pdf 
fluctuates about the constant Aq = 59.42 and the amplitude of the response Ai = —59.67 is 
smaller than the amplitude of the driver oi = —69.38. There is no noticeable lag (A ^ 0). 
Figure H] demonstrates qualitative agreement between the model and the monthly sunspot 
data, and in particular the shape and time variation of the distribution is consistent with 
the data. The figure illustrates how the characteristic curve determines the long-term 
response to the driver 6{t), and how the sunspot number varies more during solar maximum. 
It also shows the accumulation of probability about zero sunspot number at times of solar 
minimum, matching the observed low sunspot number at those times. 

Figure [5] shows the model sunspot number distribution /(s,tniax) at the maximum of 
cycle 23 (dashed curve), and the distribution /(s,tmin) at the previous minimum (solid 
curve). The distribution at solar maximum is a positively skewed Gaussian. The tail of the 
distribution at solar minimum exhibits exponential decay. The distributions qualitatively 
coincide with the empirical distributions in Figure |2j 

To investigate the statistical agreement between the model and the observations we 
first compare the quantiles of the model and the empirical distribution. The tails of the 
model distribution represent the probability of observing unusually large or small sunspot 
numbers. To quantify the accuracy of the tails of the model we calculate the lower and 
upper a% quantiles SL(t) and su(t) for each month. These quantiles are defined at time t by 



That is, given the initial sunspot number sq = 18.9 for January 1975, the probability 
of observing a sunspot number less than ^^(t) at time t is a%. Table [2] compares the 
proportion of monthly data lying outside the lower and upper a% quantiles of the model pdf 
over the period January 1975 to December 2006 for a = 20%, 10%, 5%, 1% and a = 0.50%. 
Table [2] shows good agreement between the model values and the observations, and confirms 




(34) 
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that the tails of the model sunspot number distribution are accurate over the thirty years. 

Table 2: Comparison of model and observed tail probabilities for the monthly sunspot number 
for 1975-2006. 

Model quantiles a = 20% a = 10% a = 5% a = 1% a = 0.5% 

Observed upper quantiles 18 9.1 4.2 0.54 0.52 

Observer lower quantiles 23 13 5.5 1.0 0.78 



We also investigate the time-averaged behaviour of the sunspot number distribution 
over a number of cycles, and test the quantitative agreement between the model and data 



using a test (jPress et al. 



19921 ). We construct the time-averaged model distribution 



1 



T 



f{s) = - -/ f{s,t')dt' (35) 

It — Co J to 

over the duration of the observations (i.e. = January 1975 to = December 2006). 
This time-averaged model distribution is calculated by integrating the numerical solution 
to equation ([8]) using the ML parameter set in Table [U for the interval January 1975 to 
December 2006. To calculate representative uncertainties from the model distribution we 
let Oi be the number of monthly observations in bin i, and Mj be the number implied by 
the model. The uncertainty in each bin is approximately 



where As = 8.33 is the bin width. Figure [6] compares the time- averaged distribution ( l35l) 
of the model (squares) with a histogram of the monthly sunspot number for the duration. 
The representative model uncertainties af^i in each bin are also shown by the error bars. 
The model distribution reproduces an observed bimodality in the data. The data shows 
peaks at s ~ 10 and s ~ 110, which correspond to the minimum and maximum of the 
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cycles, respectively. A test (jPress et al.lll992l ) is applied, with 



Oi + Mi 



(37) 



The test returns a p value of 0.62, which is not significant. This says that the data cannot 
be excluded given the model, and indicates quantitative agreement between the model and 
data. 



The maximum likelihood procedure outlined in section (13. ip is computationally 
intensive due to the repeated numerical solution of the Fokker-Planck equation ([8]) inside 
the log-likelihood ( l32l) . In this section we briefiy present an analytic approximation which 
allows parameter estimation of large data sets and/or models where the driver function 6{t) 
is difficult to evaluate. A standard approximation is to assume the conditional pdf /(s, t\sq) 
is approximately normal for small r = t — to, so that the log-likelihood can be optimized 
analytically. However, this approximation is not valid for the sunspot model, since it 
would imply, for small Sq, a significant probability of negative sunspot numbers. Instead 
we assume that the advection and diffusion coefficients fi{s,t) and a'^{s,t) are constant for 
small T = t — to and discard the linear terms in the expansions for fi and o"^, in which case 
the Fokker-Planck equation is the constant coefficient advection/ diffusion equation 



4. Approximate solution 





The solution to equation (138|) with a zero probability flux boundary condition at s = 
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IS 



f{s,t\so) 



1 



+ exp 



exp <^ 



{so + fl{so,to)T)Y 

2cr2(so^to)r 



S + {so+ fi{So,to)T)Y 



2a2(so,to)^ 



(39) 



This solution is analogous to the O (-\/r) Euler approximation (IKloeden fc PlatenI 119991 ) 
to the Fokker-Planck equation but with a zero flux boundary condition. Equation ( l39i) 
is the conditional pdf of the random variable \s(t)\ where s{t) is described by a normal 



distribution with mean sq + /i(so,to)T and variance cr^(so,to)T, which we denote 



s{t)\so ~ N [sq + ^{so,tQ)T, a'^{so,to)T] 



(40) 



Equation ( 1391) provides an analytic formula for efficient maximum likelihood estimation 
involving large data sets, or when 6{t) is difficult to evaluate. Equation fHOj) allows 
simulation of solar cycles for a given driver function 6{t). These formulae will be useful in 
the application of the model to forecasting daily sunspot numbers. 



5. Discussion 

This paper presents a framework for modelling randomness on top of deterministic 
models of solar cycles in a statistically optimal way. The Fokker-Planck equation 
formulation allows a general choice of driver function representing the underlying solar 
cycles, and the framework then describes the stochastic variation in the sunspot number 
on top of the (assumed) driver. The approach may be used with a variety of models for 
variation in solar cycles, including those exhibiting nonlinear and chaotic behaviour. The 
model describes a non-negative diffusion process and naturally accounts for the complicated 
behaviour at the lower boundary at zero sunspot number. It is therefore valid and useful 
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during both solar maximum and minimum. As such this framework should be particularly 
useful for solar cycle forecasters and is complementary to existing modeling techniques. 

To introduce the methodology, section [3] assumes a simple harmonic form for the driver 
function for solar cycles during 1975-2006 (cycles 21-23). Despite the simplification in 
the description of the periodic variation the model shows both qualitative and statistical 
agreement with the monthly sunspot data. A ^ test confirms consistency between the 
monthly sunspot data and the model over the three solar cycles. Further, the model tail 
probabilities (quantiles) coincide well with the observed rate of occurrence of large and small 
sunspot numbers. Since forecasters are largely concerned with predicting 'abnormally' large 
events, this is a desirable quality. The success of the model in reproducing the statistics 
of observed sunspot numbers despite the use of a simplistic driver function (which has a 
constant amplitude for three cycles) suggests the importance of short timescale fluctuations 
to the observed statistics. 

The model neglects an explicit account of the drop in sunspot number associated with 
regions rotating off the visible disk. This is a limitation of sunspot data, due to a lack of 
observations for the reverse side of the sun. There is no difficult in principle with using data 
limited in this way: the Fokker-Planck modeling includes this (unphysical) variation in the 
observed statistics. In the future a sunspot number for both hemispheres may be available, 
and the model may be applied to the improved data. 

The motivation for this model is to provide a statistical description of the large, 
short-time scale fluctuations in sunspot number, which are important because of the space 
weather effects produced by large, complex sunspot groups, which may form and evolve 
rapidly. This paper has focused on the motivation and formulation of the model, and has 
demonstrated its ability to reproduce observed sunspot statistics. In future work we will 
apply the model in more detail to historical sunspot data and illustrate the utility of the 
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model for forecasting purposes, in particular prediction of cycle 24, the new solar cycle. 
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criticisms which have improved the paper. Patrick Noble gratefully acknowledges the 
receipt of a School of Physics Denison Postgraduate Scholarship. 
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Fig. 1. — Figure showing how the the variance in sunspot numbers increases with sunspot 
number. The upper panel shows the absolute deviations \r(t)\ = \s(t) — s(t — At)\ between 
consecutive daily sunspot numbers over the last 60 years. The lower panel is a smoothed 
sunspot number time series showing the underlying solar cycle. A minimal model for short- 
term fluctuations in s{t) must describe the non-zero variance at zero sunspot number, and 
the observed increase in variance with sunspot number. 
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Fig. 2. — Figure showing the time-averaged daily sunspot number distribution f{s) at times 
of solar minimum (green), solar maximum (red), and the overall time-averaged sunspot 
distribution (blue) for the last three complete solar cycles (1975-2006). The distribution is 
concentrated around zero during solar minimum, and is approximately a positively skewed 
Gaussian distribution during solar minimum. The overall distribution for the three cycles is 
dominated by the large number of days with zero sunspot number. 
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Fig. 3. — A toy model for a stellar cycle, showing the time evolution of the starspot number 
probability distribution function (pdf) f{s,t\so) starting from an initial number sq = 2. The 
pdf /(s, t\so) is shown by the contours. The distribution follows the characteristic curve (solid 
line) which is the response to the harmonic driver function (dashed line). The distribution 
responds to the initial condition Sq = 2 before approaching the long-term periodic response 
given by Spcr{t). The variance of f{s,t\so) is greater during stellar maximum. Probability 
'accumulates' around zero starspot number during the stellar minimum due to the zero 
probability flux boundary condition at s = 0. 
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Monthly sunspot number s{t) 

Fig. 4. — Contour plot of the monthly sunspot number distribution found by solving the 
Fokker-Planck equation (|8]) for the International sunspot data using the estimated parame- 
ters in Tabled! The initial sunspot number is sq = 18.9 for January 1975, and the 1975-2006 
monthly sunspot data (asterisks) is superposed on the contours of the model distribution. 
The response to the driver 6{t) is given by the characteristic curve (solid), and the expected 
sunspot number {s{t)) is the dashed line. The contours provide a visual representation of 
the shape of the distribution. Table |2] shows that the observed incidence of large sunspot 
numbers agrees with the tails of the model distribution very accurately. The amplitude of 
the response Ai = —59.67 is smaller than the amplitude of the driver ai = —69.38, and 
there is no noticeable lag. This plot demonstrates qualitative agreement between the model 
sunspot number distribution and the monthly data for the years 1975 to 2006. 
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Fig. 5. — Plot of the model sunspot number distribution f{s,tmax) at the maximum of cycle 
23, and the distribution /(s, tmin) at the previous minimum (dashed curve). The distribution 
at solar maximum is a positively skewed Gaussian. The tail of the distribution at solar 
minimum exhibits exponential decay. The model distributions qualitatively coincide with 
the empirical distributions shown in Figure |2j 
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Fig. 6. — Plot of the model (squares) and empirical (histogram) time-averaged distributions 
for the 1976-2006 monthly sunspot data. Representative uncertainties are shown on the 
model values. The time-averaged distribution f{s) is bimodal, with modes for both the 
model and data located at s ~ 10 and s ~ 110. A test indicates that the difference between 
the model and empirical distributions is not significant. This demonstrates quantitative 
agreement between the model sunspot number distribution and the monthly data for the 
years 1976-2006. 
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